
Added multicolumn aggregation to DBA and improved three essential parts that suffer from many chunks #1069

Open
wants to merge 3 commits into base: dev
Conversation

@s3inlc (Member) commented Jun 20, 2024

This PR solves some of the issues with loading/handling on systems with a large number of tasks or a large number of agents:

  • When creating chunks, there was previously a single global lock file that was acquired no matter which task a chunk was about to be created for. Locking per task is sufficient, as creating chunks for two different tasks at the same time poses no problem (see the sketch after this list).
  • When active tasks have a very large number of chunks, chunk assignment on the server can take so long that the agent assumes the connection has timed out (30 s). All agents can then end up in a loop of re-requesting chunks/tasks, while the server can no longer respond fast enough, because checking an active task meant looping over all of its chunks, which obviously takes longer the more chunks there are. The changes in TaskUtils solve this by looping only over the non-completed chunks and using SUM in SQL to determine the other needed values.
  • In general there are many places in the code where multiple queries sum/count/max over columns (or, even worse in most of these places, the entries are loaded and looped over in code). With many chunks, the biggest issue was the getTaskInfo() function used by the current old UI, which caused very long loading times on the tasks page whenever tasks with many chunks were listed. To tackle this, the DBA was extended slightly so that a single query can sum/count/max/... over multiple columns (as long as the WHERE conditions are the same for all of them). For getTaskInfo() specifically, this reduced loading times by roughly 5-6x.
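
As a rough illustration of the per-task locking idea (a minimal sketch, not the actual chunk creation code; the lock directory and file naming are assumptions):

// Sketch: derive the lock file name from the task ID so that chunk creation
// for different tasks no longer serializes on a single global lock.
function withTaskLock(int $taskId, callable $createChunk) {
    $lockFile = sys_get_temp_dir() . "/chunk-task-" . $taskId . ".lock"; // hypothetical path
    $handle = fopen($lockFile, "c");
    if ($handle === false) {
        throw new RuntimeException("Cannot open lock file for task " . $taskId);
    }
    try {
        flock($handle, LOCK_EX); // blocks only other chunk creators for the SAME task
        return $createChunk();   // safe to create the chunk now
    } finally {
        flock($handle, LOCK_UN);
        fclose($handle);
    }
}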

The DBA function change has also been applied to the getChunkInfo() function, but there are likely still many places where using an aggregation function alone, or the newly added multicolumn aggregation, could give additional speedups.
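
As a rough illustration (assembled from the snippets quoted in the review comments below; the filters $qF1/$qF2 are assumed to be set up by the caller), the new multicolumn aggregation returns several aggregated columns from one query:

// Sketch: sum two Chunk columns in a single query instead of issuing two queries.
$agg1 = new Aggregation(Chunk::SOLVE_TIME, "SUM");
$agg2 = new Aggregation(Chunk::DISPATCH_TIME, "SUM");
$results = Factory::getChunkFactory()->multicolAggregationFilter(
    [Factory::FILTER => [$qF1, $qF2]],
    [$agg1, $agg2]
);
$totalTimeSpent = $results[$agg1->getName()] - $results[$agg2->getName()];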

zyronix and others added 3 commits April 22, 2024 10:48
Adding loops to scan through lines to support importing hashes longer than 1024 bytes
@s3inlc requested a review from zyronix June 20, 2024 12:08
@s3inlc changed the base branch from master to dev September 27, 2024 13:46
@s3inlc requested review from jessevz and removed request for zyronix September 27, 2024 13:47

$totalTimeSpent = $results[$agg1->getName()] - $results[$agg2->getName()];

$agg1 = new Aggregation(Chunk::CHECKPOINT, "SUM");
Contributor

Instead of using magic strings like "MAX", "SUM" and "COUNT", it would probably be nicer to define constants inside the Aggregation class and use those.
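
A minimal sketch of what that could look like (the constant names are assumptions, not the actual Aggregation implementation):

class Aggregation {
    // Hypothetical constants replacing the "magic string" function names.
    const SUM   = "SUM";
    const COUNT = "COUNT";
    const MAX   = "MAX";
    const MIN   = "MIN";
    // ... existing constructor and getName() stay unchanged
}

// Call sites would then read:
$agg1 = new Aggregation(Chunk::CHECKPOINT, Aggregation::SUM);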

$qF3 = new QueryFilter(Chunk::SOLVE_TIME, 0, ">");
$agg1 = new Aggregation(Chunk::SOLVE_TIME, "SUM");
$agg2 = new Aggregation(Chunk::DISPATCH_TIME, "SUM");
$results = Factory::getChunkFactory()->multicolAggregationFilter([Factory::FILTER => [$qF1, $qF2, $qF3]], [$agg1, $agg2]);
Contributor

Isn't this query redundant? Couldn't all of this information be obtained from the query on line 399 by just adding agg1 and agg2 to it? I understand that the query here also filters for dispatch time and solve time greater than 0, but since you aggregate them with SUM anyway, that doesn't really matter: values of 0 have no effect on the sum. Getting all values out of the query on line 399 would reduce the load on the database.
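
In other words (a sketch under the assumption that the query on line 399 uses the same $qF1/$qF2 filters), the two sums could simply be appended to the existing multicolumn call and the >0 filters dropped, since zero-valued rows do not change a SUM:

// Hypothetical merged call serving both purposes with one query.
$aggs = [
    // ... whatever aggregations the query on line 399 already requests, plus:
    new Aggregation(Chunk::SOLVE_TIME, "SUM"),
    new Aggregation(Chunk::DISPATCH_TIME, "SUM"),
];
$results = Factory::getChunkFactory()->multicolAggregationFilter([Factory::FILTER => [$qF1, $qF2]], $aggs);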

@jessevz self-assigned this Oct 14, 2024
Labels: None yet
Projects: Status: 🧐 Review
3 participants